Machine learning algorithm for predicting in-hospital mortality in critically ill patients with congestive heart failure combined with chronic kidney disease

Abstract. Background: The objective of this study was to develop and validate a machine learning (ML) model for predicting in-hospital mortality among critically ill patients with congestive heart failure (CHF) combined with chronic kidney disease (CKD). Methods: After least absolute shrinkage and selection operator (LASSO) regression was used for feature selection, six ML algorithms were used to construct prediction models. The optimal model was selected based on the area under the curve (AUC). The chosen model was interpreted using SHapley Additive exPlanation (SHAP) values and the Local Interpretable Model-Agnostic Explanations (LIME) algorithm. Results: Using the Medical Information Mart for Intensive Care IV (MIMIC IV) database, this study enrolled 5041 patients with CHF combined with CKD admitted from 2008 to 2019. After feature selection, 22 of the 47 variables collected after intensive care unit (ICU) admission were identified as associated with mortality and were used to develop the ML models. Among the six models, the eXtreme Gradient Boosting (XGBoost) model achieved the highest AUC (0.837). The SHAP values identified the sequential organ failure assessment (SOFA) score, age, simplified acute physiology score II (SAPS II), and urine output as the four most influential variables in the XGBoost model. In addition, the LIME algorithm was used to explain individual predictions. Conclusions: This study developed and validated ML models for predicting in-hospital mortality in critically ill patients with CHF combined with CKD, among which the XGBoost model performed best.


Background
Congestive heart failure (CHF) remains a major cause of morbidity and mortality worldwide, affecting over 23 million people [1]. Chronic kidney disease (CKD) is common in patients with CHF and is associated with an unfavorable prognosis for both all-cause and cardiovascular mortality [2]. In pivotal CHF trials, the prevalence of CKD ranges from 32% to 50% [3]. The prognosis of patients with CHF combined with CKD is particularly poor and worsens as renal function deteriorates, ultimately leading to elevated mortality [4]. Recent research underscores the importance of early identification of critically ill patients at risk of rapid deterioration, which may improve clinical outcomes [5]. Models that identify patients with CHF combined with CKD at high risk of in-hospital death could help healthcare professionals allocate resources more efficiently, enabling personalized interventions and intensified monitoring for the patients most likely to benefit. Therefore, accurate prediction models that reliably estimate an individual's survival prognosis hold significant potential for improving clinical practice.
By leveraging large electronic health record datasets that include demographics, diagnoses, routinely measured values, and treatments, machine learning (ML) algorithms offer a promising avenue for reducing mortality in critically ill patients with CHF combined with CKD. These data-driven methods excel at handling high-dimensional data, modeling intricate relationships, and identifying key predictors of outcomes. A growing body of evidence indicates that ML techniques can outperform traditional models [6,7]. ML approaches have gained prominence in disease prognostication: well-constructed prediction models allow clinicians to identify patients at high risk of poor outcomes, facilitating more timely interventions and improved results [8][9][10]. However, analyses of outcome prediction in patients with CHF combined with CKD remain scarce. Therefore, the objective of this research was to predict in-hospital mortality among critically ill patients with CHF combined with CKD using ML methods.

Database introduction
The Medical Information Mart for Intensive Care IV (MIMIC IV) database is a comprehensive, de-identified clinical dataset approved by both the Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology [11]. The requirement for individual patient informed consent was waived because the study did not affect clinical decision-making and all patients in the database remained anonymous [12]. Author XL completed the protection of human research participants examination and obtained a certificate authorizing access to the database (No. 35970146).

Study population
All patients in the MIMIC IV database diagnosed with both CKD and CHF were included in this study. Diagnoses of CKD and CHF were based on the International Classification of Diseases, Ninth Revision (ICD-9) and Tenth Revision (ICD-10) codes recorded by hospital staff at discharge (Supplementary Table S1). For patients with multiple ICU admissions, only the first admission was considered. Patients younger than 18 years and those with an ICU stay of less than 24 h were excluded.
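The cohort rules above (first ICU stay only, adults only, ICU stay of at least 24 h) can be expressed as a short pandas filter. This is a minimal sketch, not the authors' extraction code: it assumes one row per ICU stay with columns `subject_id`, `stay_id`, `intime`, and `los` (length of stay in fractional days, as in the MIMIC IV `icustays` table), plus a precomputed `age` column whose name is hypothetical. It also assumes that a patient whose first ICU stay lasted under 24 h is excluded rather than replaced by a later stay.

```python
import pandas as pd

def select_cohort(stays: pd.DataFrame) -> pd.DataFrame:
    """Apply the exclusion rules to a one-row-per-ICU-stay table (assumed columns)."""
    # Keep only each patient's first ICU stay (earliest intime).
    first_stays = stays.sort_values("intime").groupby("subject_id").head(1)
    # Exclude patients younger than 18 years ('age' is a hypothetical precomputed column).
    adults = first_stays[first_stays["age"] >= 18]
    # Exclude ICU stays shorter than 24 h; 'los' is in fractional days, so 24 h == 1.0.
    return adults[adults["los"] >= 1.0].sort_values("stay_id").reset_index(drop=True)

# Toy example: subject 1's first stay is too short, so subject 1 is excluded entirely;
# subject 2 is under 18; only subject 3 remains.
stays = pd.DataFrame({
    "subject_id": [1, 1, 2, 3],
    "stay_id": [10, 11, 20, 30],
    "intime": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
    "los": [0.5, 3.0, 2.0, 2.0],
    "age": [70, 70, 17, 65],
})
cohort = select_cohort(stays)
print(cohort["stay_id"].tolist())
```

Applying the first-stay rule before the length-of-stay rule mirrors the order in which the exclusions are reported for this cohort.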

Data collection
We used Navicat Premium software to extract data from the MIMIC IV database. Considering all available parameters and drawing on clinical expertise, we selected 47 candidate variables based on their association with outcomes. Demographic information comprised age, sex, weight, and ethnicity. Comorbidities included cerebrovascular disease, rheumatic disease, chronic obstructive pulmonary disease (COPD), diabetes, peripheral vascular disease, myocardial infarction, peptic ulcer disease, liver disease, paraplegia, cancer, and acute kidney injury (AKI). The patient's CKD stage was also recorded. We collected the initial values of vital signs within 24 h of admission, including temperature, respiratory rate, mean arterial pressure, heart rate, systolic blood pressure, and oxygen saturation. For biochemical indices, we collected the initial values within the first 24 h after admission for serum sodium, serum potassium, bicarbonate, serum chloride, serum calcium, serum glucose, serum creatinine, international normalized ratio, anion gap, blood urea nitrogen (BUN), white blood cell count, platelet count, hemoglobin, hematocrit, prothrombin time (PT), and partial thromboplastin time (PTT). BUN measures the level of urea nitrogen in the bloodstream and is commonly used to assess kidney function; elevated BUN levels can indicate kidney dysfunction or other medical conditions. PT measures how long it takes for blood to clot and is often used to assess the function of the coagulation (blood clotting) system and to monitor the effects of anticoagulant medications. We recorded the total urine output during the first 24 h after ICU admission. Within the same time frame, we recorded medical treatments such as mechanical ventilation, vasopressors, and renal replacement therapy.
As severity-of-illness scores, we recorded the first values of the sequential organ failure assessment (SOFA) score and the simplified acute physiology score II (SAPS II) within the first 24 h after admission. The SOFA score assesses the severity of organ dysfunction or failure in critically ill patients by evaluating six organ systems: neurological, renal, coagulation, hepatic, cardiovascular, and respiratory. SAPS II is a severity-of-illness scoring system that predicts the risk of mortality in critically ill patients; it incorporates several physiological parameters, age, and underlying chronic diseases to estimate the probability of in-hospital mortality. We chose the SOFA and SAPS II scores because they are widely recognized and have established utility for assessing disease severity in critically ill patients across many studies and clinical settings. Together, these scoring systems provide a comprehensive evaluation of organ dysfunction and physiological derangement, allowing a reliable quantification of disease severity.
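Many of the variables above are defined as the first value recorded within 24 h of admission. A minimal pandas sketch of that rule follows. It assumes a long-format event table with hypothetical columns `stay_id`, `variable`, `charttime`, and `value`, plus a stays table with `stay_id` and `intime`; the study does not describe its extraction queries, so this illustrates the rule rather than reproducing them.

```python
import pandas as pd

def first_values_within_24h(events: pd.DataFrame, stays: pd.DataFrame) -> pd.DataFrame:
    """Earliest value per ICU stay and variable, charted 0-24 h after ICU admission."""
    merged = events.merge(stays[["stay_id", "intime"]], on="stay_id", how="inner")
    elapsed = merged["charttime"] - merged["intime"]
    # Keep measurements taken from ICU admission up to 24 h afterwards.
    window = merged[(elapsed >= pd.Timedelta(0)) & (elapsed <= pd.Timedelta(hours=24))]
    # Sort by time, keep the first row of each (stay, variable) pair.
    first = window.sort_values("charttime").groupby(["stay_id", "variable"]).head(1)
    # One row per ICU stay, one column per variable (NaN if never measured in the window).
    return first.pivot(index="stay_id", columns="variable", values="value")

stays = pd.DataFrame({
    "stay_id": [1, 2],
    "intime": pd.to_datetime(["2020-01-01 00:00", "2020-01-02 00:00"]),
})
events = pd.DataFrame({
    "stay_id": [1, 1, 1, 1, 1, 2],
    "variable": ["heart_rate", "heart_rate", "heart_rate", "sodium", "sodium", "heart_rate"],
    "charttime": pd.to_datetime([
        "2020-01-01 05:00", "2020-01-01 02:00", "2020-01-02 03:00",  # last is 27 h after admission
        "2019-12-31 23:00", "2020-01-01 10:00",                      # first precedes ICU admission
        "2020-01-02 01:00",
    ]),
    "value": [90, 110, 70, 130, 138, 80],
})
baseline = first_values_within_24h(events, stays)
print(baseline)
```

Total urine output over the same window would use a sum instead of the first row, for example `window.groupby("stay_id")["value"].sum()` on the urine events.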

Preprocessing of data
Missing values are common in the MIMIC IV database; in this study, every variable had less than 20% missing values (Supplementary Table S2). We used multiple imputation to fill in the missing data. Least absolute shrinkage and selection operator (LASSO) regression adds an L1 penalty to the regression objective, which shrinks the coefficients of less informative variables to exactly zero. It therefore acts as a dimensionality-reduction method that helps identify the key factors affecting the outcome, improves model performance, and reduces overfitting. We therefore used LASSO regression to identify variables that may be associated with mortality. For the LASSO analysis, we used the entire dataset for model development and applied cross-validation to tune the penalty parameter (λ). Specifically, we adopted 10-fold cross-validation: the dataset was partitioned into 10 subsets, the model was fit on nine subsets and evaluated on the remaining one in turn, and λ was chosen as the value that minimized the cross-validated error. This approach explores the regularization parameter space systematically and identifies the λ with the best cross-validated predictive performance.
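The imputation and LASSO steps above can be sketched in Python with scikit-learn. This is an illustrative analogue, not the authors' pipeline: they report using glmnet in R, and the imputation model, the number of imputed datasets, and how selections were combined across imputations are not stated. The sketch assumes chained-equation imputation via `IterativeImputer` (with `sample_posterior=True`, so each draw differs) and an L1-penalized logistic regression whose penalty is chosen by 10-fold cross-validated log-loss. scikit-learn's `C` is an inverse penalty strength on a differently scaled objective, so its selected value is not numerically comparable to a glmnet λ.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

def multiply_impute(X, n_imputations=5):
    """Return several completed copies of X; posterior sampling makes the copies differ."""
    return [
        IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed).fit_transform(X)
        for seed in range(n_imputations)
    ]

def lasso_select(X, y, n_folds=10):
    """Indices of features kept by L1 logistic regression with the penalty tuned by k-fold CV."""
    X_std = StandardScaler().fit_transform(X)  # common scale, as glmnet standardizes by default
    model = LogisticRegressionCV(
        Cs=20, cv=n_folds, penalty="l1", solver="liblinear", scoring="neg_log_loss"
    ).fit(X_std, y)
    return np.flatnonzero(model.coef_.ravel() != 0)

# Synthetic demo: only features 0 and 1 carry signal; about 10% of entries are missing.
rng = np.random.default_rng(42)
X_full = rng.normal(size=(600, 8))
risk = 1.0 / (1.0 + np.exp(-(2.0 * X_full[:, 0] - 1.5 * X_full[:, 1])))
y = (rng.random(600) < risk).astype(int)
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.10] = np.nan

imputed = multiply_impute(X_missing)
selected = lasso_select(imputed[0], y)
print("features kept:", selected)
```

For brevity the sketch runs selection on the first completed dataset only; a fuller analysis would repeat it on every imputed dataset and pool the results.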

Statistical analysis
Because continuous variables were not normally distributed, they are presented as medians with interquartile ranges (IQRs) and were compared between groups with the Mann-Whitney U test. Categorical variables are presented as numbers and percentages and were compared with the chi-square test or Fisher's exact test, as appropriate.
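The group comparisons above map onto standard SciPy functions. In this minimal sketch, the rule for switching to Fisher's exact test (any expected cell count below 5 in a 2x2 table) is a common convention assumed here, since the text only says "as appropriate".

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, mannwhitneyu

def median_iqr(values):
    """Format a continuous variable as 'median (Q1-Q3)'."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return f"{median:.1f} ({q1:.1f}-{q3:.1f})"

def compare_continuous(group_a, group_b):
    """Two-sided Mann-Whitney U test; returns the P value."""
    return mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue

def compare_categorical(table):
    """Chi-square test, or Fisher's exact test for a 2x2 table with any expected count below 5."""
    table = np.asarray(table)
    chi2, p_chi2, dof, expected = chi2_contingency(table)  # Yates-corrected for 2x2 by default
    if table.shape == (2, 2) and (expected < 5).any():
        _, p_fisher = fisher_exact(table)
        return "Fisher", p_fisher
    return "Chi-square", p_chi2

print(median_iqr([1, 2, 3, 4, 5]))
print(compare_continuous(list(range(1, 11)), list(range(20, 30))))
print(compare_categorical([[1, 9], [8, 2]]))     # expected counts 4.5/5.5 -> Fisher
print(compare_categorical([[50, 50], [30, 70]]))  # expected counts 40/60 -> chi-square
```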
We performed the statistical analysis in Python (version 3.9.12) and R (version 4.2.1; R Foundation for Statistical Computing). Python packages used for data processing and modeling included pandas, NumPy, scikit-learn, XGBoost, SHapley Additive exPlanation (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME); R packages included glmnet and ROCR. Statistical significance was defined as a P value below 0.05.

Machine learning
All patients in this study were randomly divided into a training set (70%) and a validation set (30%). Six ML techniques, namely extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), decision tree, and logistic regression, were used to build and validate models of in-hospital mortality risk. To evaluate predictive performance, we calculated the accuracy, sensitivity, specificity, area under the curve (AUC), recall, precision, F1 score, and Matthews correlation coefficient (MCC) of each model. The test AUCs of the different models were compared using the paired DeLong test, and calibration curves were plotted to compare observed with predicted mortality risk. The final candidate model was selected based on the AUC.
Because SOFA and SAPS II scores are commonly used to assess severity and prognosis in critically ill patients, we compared the predictive power of the final model with that of these traditional scoring systems. The American Heart Association Get With The Guidelines-Heart Failure (GWTG-HF) risk score is a widely accepted scoring system for stratifying in-hospital mortality risk. It is calculated from patient data including age, systolic blood pressure, blood urea nitrogen, heart rate, serum sodium, COPD, and non-African American ethnicity. We also compared our final model with the GWTG-HF risk score.
The SHAP value is rooted in cooperative game theory. It attributes to each feature a value indicating that feature's contribution to a given prediction; in ML, SHAP values therefore explain a model's output by quantifying the impact of each feature on that output. We used SHAP values to examine the importance of individual features for the model's output and to depict how the relevant features influence mortality risk. LIME is a method designed to provide local explanations for the predictions of complex ML models. It explains the predictions of any black-box model by training a simpler, interpretable model on data around the instance being explained: by generating and analyzing a dataset of perturbed instances, LIME builds local models that mimic the complex model's behavior around specific cases. In this study, we applied the LIME algorithm to explain the model's predictions for individual patients. Finally, subgroup analyses were performed according to the presence of sepsis, diabetes, paraplegia, cancer, and AKI, and according to CKD stage.
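The LIME procedure described above (perturb one patient's record, query the black-box model, weight the perturbed samples by proximity, and fit a simple weighted linear model) can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the `lime` package: that library additionally discretizes tabular features by default, samples perturbations from training-set statistics, and can restrict an explanation to a few features. Here, perturbations are Gaussian in units of each feature's standard deviation, the kernel is exp(-d²/w²) with w = 0.75·√p (the width the `lime` tabular explainer uses by default), and the surrogate is a lightly regularized weighted linear regression. The stand-in "black box" is a hypothetical logistic function, not the study's XGBoost model.

```python
import numpy as np

def lime_explain(predict, x, feature_std, n_samples=5000, seed=0, ridge=1e-3):
    """Local linear surrogate around x; returns (intercept, weights per feature SD)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    feature_std = np.asarray(feature_std, dtype=float)
    p = x.size
    kernel_width = 0.75 * np.sqrt(p)

    # 1. Perturb the instance in standardized units and query the black-box model.
    z = rng.normal(size=(n_samples, p))
    preds = predict(x + z * feature_std)

    # 2. Weight each perturbed sample by its proximity to x (exponential kernel).
    weights = np.exp(-np.sum(z**2, axis=1) / kernel_width**2)

    # 3. Weighted ridge regression of predictions on the standardized offsets.
    design = np.column_stack([np.ones(n_samples), z])
    penalty = ridge * np.eye(p + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    gram = design.T @ (weights[:, None] * design) + penalty
    coef = np.linalg.solve(gram, design.T @ (weights * preds))
    return coef[0], coef[1:]

def black_box(X):
    """Hypothetical stand-in for a trained model's predicted probability of death."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))

patient = np.array([0.2, 0.1, 5.0])  # the third feature is ignored by the black box
intercept, local_weights = lime_explain(black_box, patient, feature_std=np.ones(3))
print(f"local prediction ~ {intercept:.3f}")
for name, w in zip(["feature_0", "feature_1", "feature_2"], local_weights):
    print(f"{name}: {w:+.3f}")
```

Positive local weights push this patient's predicted risk up and negative ones pull it down, which is the form in which individual explanations are reported in the results below.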

Participants
A total of 9377 patients with CHF combined with CKD were identified as eligible; of these, 3637 were excluded because the ICU admission was not their first, and 1672 were excluded because their ICU stay was shorter than 24 h. Finally, 5041 patients met the inclusion and exclusion criteria of this study (Figure 1). The in-hospital mortality rate among ICU-admitted patients with CHF combined with CKD was 18.5% (933/5041). Of these patients, 60.5% (3049/5041) were male, and the median age was 76.9 years (IQR: 67.9-84.8). Diabetes (2742/5041, 54.4%), sepsis (2095/5041, 41.6%), and COPD (1874/5041, 37.2%) were the three most common comorbidities. The demographics, comorbidities, vital signs, biochemical indices, urine output, medical treatments, and severity-of-illness scores of the patients are listed in Table 1.

Predictor selection
A total of 47 clinical variables were included in the LASSO regression; Supplementary Figure S1A plots the regression coefficients, with each curve representing one variable. At each value of λ, the variables with nonzero coefficients, together with those coefficients, form the corresponding LASSO model. The LASSO feature selection process is shown in Supplementary Figure S1B. We used 10-fold cross-validation to determine the optimal model; the cross-validation error was minimized at λ = 0.0077. Ultimately, 22 variables were retained as predictors of death (Supplementary Table S3). Correlation coefficients among these variables are shown in Supplementary Table S4.

Model development and validation
The included patients were randomly assigned to the training set (3528, 70%) and the validation set (1513, 30%), and no significant differences in the variables were observed between the two sets (Supplementary Table S5). We built six ML models (XGBoost, KNN, SVM, RF, decision tree, and logistic regression) using the 22 variables selected by LASSO regression as inputs. The XGBoost model achieved the highest AUC in the validation set (0.837), followed by logistic regression (0.828), RF (0.820), SVM (0.737), KNN (0.670), and decision tree (0.616) (Figure 2(A) and Supplementary Table S6). Supplementary Table S7 shows the AUCs of the six models in the training set. The XGBoost model also outperformed SAPS II (AUC: 0.752), the SOFA score (AUC: 0.766), and the GWTG-HF risk score (AUC: 0.688) (Figure 2(B)). Table 2 reports the AUC, accuracy, sensitivity, specificity, recall, precision, F1 score, and MCC of the six ML models. Calibration plots for the six ML models are shown in Supplementary Figure S2.
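The metrics in Table 2 and the paired DeLong comparison of AUCs named in the Methods can be sketched as follows. This is a minimal illustration under assumptions: the probability threshold used to turn predictions into classes (0.5 here) is not reported in the text, and the DeLong variance is computed with a midrank formulation for two models scored on the same patients. Sensitivity and recall are the same quantity, so the sketch reports it once.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import confusion_matrix, f1_score, matthews_corrcoef, roc_auc_score

def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Table 2-style metrics; the 0.5 threshold is an assumption."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_prob),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp) if (tp + fp) > 0 else 0.0,
        "F1": f1_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
    }

def _midranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks_sorted = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and sorted_vals[j] == sorted_vals[i]:
            j += 1
        ranks_sorted[i:j] = 0.5 * (i + j + 1)  # mean of 1-based positions i+1 .. j
        i = j
    ranks = np.empty(len(values))
    ranks[order] = ranks_sorted
    return ranks

def delong_test(y_true, scores_a, scores_b):
    """Paired DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, two-sided p)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.vstack([scores_a, scores_b]).astype(float)
    pos, neg = scores[:, y_true], scores[:, ~y_true]
    m, n = pos.shape[1], neg.shape[1]
    aucs, v10, v01 = np.empty(2), np.empty((2, m)), np.empty((2, n))
    for k in range(2):
        tx, ty = _midranks(pos[k]), _midranks(neg[k])
        tz = _midranks(np.concatenate([pos[k], neg[k]]))
        aucs[k] = (tz[:m].sum() - m * (m + 1) / 2) / (m * n)
        v10[k] = (tz[:m] - tx) / n        # per-positive structural components
        v01[k] = 1.0 - (tz[m:] - ty) / m  # per-negative structural components
    cov = np.cov(v10) / m + np.cov(v01) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 1e-15:  # identical rankings: no detectable difference
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs[0], aucs[1], z, 2 * norm.sf(abs(z))

y = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.1, 0.2, 0.8, 0.9]
print(threshold_metrics(y, model_a))
print(delong_test(y, model_a, model_b))
```

Because both AUCs are estimated on the same patients, the test uses the covariance between the two models' structural components rather than treating the AUCs as independent.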

Model explainability
Using SHAP values, we aimed to elucidate how the XGBoost model predicts mortality. Figure 3 shows the feature importance ranking of the XGBoost model derived from SHAP values, identifying the SOFA score, age, SAPS II, and urine output as the four primary contributors to the model. To provide more detail, we also present plots for the four most heavily weighted clinical features of the XGBoost model, showing the relationship between feature values and their SHAP values (Figures 4 and 5).
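SHAP values are Shapley values from cooperative game theory: each feature's attribution is its average marginal contribution over all coalitions of the other features, φ_i = Σ_{S ⊆ N\{i}} |S|!(p-|S|-1)!/p! · [v(S ∪ {i}) - v(S)]. For a handful of features they can be computed exactly by enumeration, as in the NumPy sketch below. This only illustrates the definition: the `shap` library uses TreeSHAP to compute such values efficiently for tree ensembles such as XGBoost, and its value function may differ from the interventional one assumed here (features outside a coalition are taken from rows of a background dataset and the predictions averaged). The linear toy model is hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values of predict at x, using an interventional value function."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    p = x.size

    def value(coalition):
        # Features in the coalition take x's values; the rest come from background rows.
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical linear "model": its Shapley values should equal w_j * (x_j - background mean_j).
w = np.array([2.0, -1.0, 0.5])

def linear_model(X):
    return X @ w + 1.0

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
x = np.array([1.0, 2.0, 3.0])
phi = exact_shapley(linear_model, x, background)
print(phi)
print(w * (x - background.mean(axis=0)))
```

The attributions add up to the prediction for x minus the average background prediction, which is why a SHAP value above zero can be read as pushing the predicted risk of death up. The number of coalitions grows as 2^p, so this brute-force form is impractical for the 22 predictors used here.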
The LIME method was then applied to two randomly selected patients from the validation set to illustrate individual mortality predictions. Figure 6(A) shows the LIME explanation for a patient who died. The XGBoost model estimated this patient's probability of death at 98%. A SOFA score of 10, SAPS II of 75, age of 88.51 years, urine output of 2 mL, anion gap of 32 mEq/L, and PTT of 74.8 s were all associated with an increased risk of death in the XGBoost model, whereas the absence of a history of cerebrovascular disease or liver disease lowered the predicted probability of death. For this patient, both the XGBoost prediction and the actual outcome were death. Similarly, Figure 6(B) depicts a survival case explained with the LIME method. The XGBoost model estimated this patient's probability of death at 3%. An age of 85.39 years and a respiratory rate of 24 breaths/min were associated with an increased risk of death in the XGBoost model, whereas a SOFA score of 2, SAPS II of 34, PTT of 27.1 s, and the absence of paraplegia, liver disease, or cerebrovascular disease reduced it. The XGBoost model predicted survival for this patient, which matched the actual outcome.

Subgroup analyses
Subgroup analyses by the presence or absence of sepsis, diabetes, paraplegia, cancer, and AKI, and by CKD stage, showed that the XGBoost model remained robust in predicting mortality across these subgroups. Detailed results are provided in Supplementary Figure S3.

Discussion
This study developed and validated six models, incorporating 22 clinical variables, to predict in-hospital mortality among critically ill patients with CHF combined with CKD. The XGBoost model outperformed the other models (KNN, SVM, RF, decision tree, and logistic regression) as well as traditional risk scores (SAPS II, SOFA score, and GWTG-HF) in predicting death in this population. Feature importance analysis showed that the SOFA score, age, SAPS II, and urine output were the four features with the greatest impact on the XGBoost model's prediction of in-hospital mortality, and we describe how each of these factors influences the model. These findings contribute to a more complete understanding of ML models for predicting in-hospital mortality in critically ill patients with CHF combined with CKD.
Heart failure (HF) accounts for more than one million primary and roughly three million secondary hospital admissions annually in the United States and is associated with high mortality and substantial morbidity [13][14][15]. It therefore places a heavy burden on affected patients and on healthcare systems worldwide. HF frequently coexists with prognostically relevant comorbidities and directly affects other organs, such as the kidneys. Progression of HF or kidney disease can worsen patient outcomes by activating vicious cycles that accelerate cardiac and renal deterioration [16,17]. A study conducted in the United States found that hospitalization rates for HF were high among patients with CKD and that individuals with both CKD and HF had an increased risk of CKD progression and death [18]. To reduce mortality, predictive models are needed that can accurately and promptly identify patients at heightened risk of clinical deterioration.
In our comparative analysis, the difference in AUC between the XGBoost and logistic regression models was not statistically significant. However, the choice of an optimal model is not determined by statistical significance alone. Several considerations support the practical value of XGBoost for predicting in-hospital mortality in critically ill patients with CHF combined with CKD. XGBoost excels at capturing complex, nonlinear relationships within the data, a vital consideration given the complexity of critically ill patients. Additionally, the model's interpretability is enhanced through the use of SHAP values and the LIME algorithm, … patients with HF [19]. Hu et al. found that XGBoost outperformed RF, naive Bayes, decision trees, logistic regression, KNN, and SVM in predicting in-hospital mortality among critically ill patients with acute kidney injury [20]. In a meta-analysis, XGBoost also outperformed other ML techniques, including Bayesian networks and SVM, in predicting acute kidney injury [21]. Moreover, the conventional scoring systems, namely SAPS II, the SOFA score, and GWTG-HF, performed worse than the best-performing ML models (XGBoost, logistic regression, and RF) in our cohort. This suggests that traditional scoring tools may be less reliable for predicting mortality in critically ill patients with CHF combined with CKD. Although SAPS II, the SOFA score, and GWTG-HF can estimate the likelihood of adverse outcomes in critically ill patients, they omit many potentially relevant parameters, which may make their predictions less accurate than those of multivariable models [22]. Prior research has likewise reported that SAPS II, the SOFA score, and GWTG-HF have inferior predictive ability compared with ML models [6].
To our knowledge, this is the first study to use ML algorithms to predict in-hospital mortality in critically ill patients with CHF combined with CKD. In this population, the XGBoost model identified the SOFA score, age, SAPS II, and urine output as the features most strongly linked with mortality. The SOFA score describes the presence and severity of organ dysfunction [23]. It assigns each of six organ systems (respiratory, circulatory, renal, hematologic, hepatic, and central nervous system) a daily score from 0 to 4 according to the severity of dysfunction [24]. SOFA scores are strongly associated with clinical outcomes [25]. Consistent with this, the SOFA score carried the greatest weight in our XGBoost model and was the most important predictor of mortality in critically ill patients with CHF combined with CKD. Age is also a major risk factor for mortality in this population. Numerous studies have shown that older age is associated with an increased risk of death in critically ill patients with CHF combined with CKD, and in our cohort the median age of non-survivors was higher than that of survivors. SAPS II is another significant predictor of mortality. It includes 17 variables, and higher total scores indicate greater illness severity [26]. Prior studies have established an association between higher SAPS II and elevated mortality among ICU patients [27]. In addition, we found a correlation between urine output and mortality in critically ill patients with CHF combined with CKD. Oliguria is common among ICU patients and is closely associated with renal parenchymal damage [28]. Numerous studies have shown a correlation between reduced urine output and unfavorable outcomes in critically ill individuals [28,29].
However, this study has some limitations. First, its retrospective design may introduce unavoidable selection bias. Second, coexisting comorbidities may partly confound the outcomes of patients with CKD and CHF. Third, this is a single-center study, and the results may not generalize to other centers. Prospective, multicenter studies are therefore needed to further validate these findings.

Conclusions
In conclusion, ML models emerged as a dependable tool for mortality prediction in critically ill patients with CHF combined with CKD. Among all the prediction models, the XGBoost model was the most effective, offering clinicians a valuable tool for accurate risk management and timely intervention to mitigate mortality in critically ill patients with CHF combined with CKD who are at high risk of death.

Figure 1. The flowchart of patient selection. Abbreviations: CHF: congestive heart failure; CKD: chronic kidney disease; MIMIC IV: Medical Information Mart for Intensive Care IV; ICU: intensive care unit.

Figure 2. ROC curves for predicting in-hospital mortality with ML models and the traditional severity of illness scores. (A) ROC curves of six ML models for predicting in-hospital mortality; (B) ROC curves of the traditional severity of illness scores for predicting in-hospital mortality. Abbreviations: ROC: receiver operating characteristic; SVM: support vector machine; KNN: k-nearest neighbors; AUC: area under the curve; SOFA: sequential organ failure assessment; SAPS II: simplified acute physiology score II.

Figure 3. Important features derived from the XGBoost model. Ranking of feature importance indicated by SHAP. The matrix plot depicts the importance of each covariate in the development of the final predictive model. Abbreviations: SHAP: SHapley Additive exPlanation; SOFA: sequential organ failure assessment; SAPS II: simplified acute physiology score II; PTT: partial thromboplastin time; BUN: blood urea nitrogen; PT: prothrombin time; SpO2: oxygen saturation; MAP: mean arterial pressure.

Figure 4. SHAP summary plot of the features of the XGBoost model. The higher the SHAP value of a feature, the higher the predicted probability of death. Each line represents a feature, and the abscissa is the SHAP value. Red dots represent higher feature values, and blue dots represent lower feature values. Abbreviations: SHAP: SHapley Additive exPlanation; SOFA: sequential organ failure assessment; SAPS II: simplified acute physiology score II; PTT: partial thromboplastin time; BUN: blood urea nitrogen; PT: prothrombin time; SpO2: oxygen saturation; MAP: mean arterial pressure.

Figure 5. SHAP dependence plot of the XGBoost model. A SHAP value above zero for a feature indicates an increased predicted risk of death. Abbreviations: SHAP: SHapley Additive exPlanation; SOFA: sequential organ failure assessment; SAPS II: simplified acute physiology score II.

Table 1. Demographic and clinical characteristics at baseline.

Table 2. Performance comparison of the six models in the testing set.